concept mention
A Rule Based Solution to Co-reference Resolution in Clinical Text
Chen, Ping, Hinote, David, Chen, Guoqing
Objective: The aim of this study was to build an effective co-reference resolution system tailored for the biomedical domain. Materials and Methods: Experiment materials used in this study is provided by the 2011 i2b2 Natural Language Processing Challenge. The 2011 i2b2 challenge involves coreference resolution in medical documents. Concept mentions have been annotated in clinical texts, and the mentions that co-refer in each document are to be linked by coreference chains. Normally, there are two ways of constructing a system to automatically discover co-referent links. One is to manually build rules for co-reference resolution, and the other category of approaches is to use machine learning systems to learn automatically from training datasets and then perform the resolution task on testing datasets. Results: Experiments show the existing co-reference resolution systems are able to find some of the co-referent links, and our rule based system performs well finding the majority of the co-referent links. Our system achieved 89.6% overall performance on multiple medical datasets. Conclusion: The experiment results show that manually crafted rules based on observation of training data is a valid way to accomplish high performance in this coreference resolution task for the critical biomedical domain.
Boosting Biomedical Concept Extraction by Rule-Based Data Augmentation
Shao, Qiwei, Mo, Fengran, Nie, Jian-Yun
Document-level biomedical concept extraction is the task of identifying biomedical concepts mentioned in a given document. Recent advancements have adapted pre-trained language models for this task. However, the scarcity of domain-specific data and the deviation of concepts from their canonical names often hinder these models' effectiveness. To tackle this issue, we employ MetaMapLite, an existing rule-based concept mapping system, to generate additional pseudo-annotated data from PubMed and PMC. The annotated data are used to augment the limited training data. Through extensive experiments, this study demonstrates the utility of a manually crafted concept mapping tool for training a better concept extraction model.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Italy (0.04)
- (3 more...)
Enhancing Textbooks with Visuals from the Web for Improved Learning
Singh, Janvijay, Zouhar, Vilém, Sachan, Mrinmaya
Textbooks are one of the main mediums for delivering high-quality education to students. In particular, explanatory and illustrative visuals play a key role in retention, comprehension and general transfer of knowledge. However, many textbooks lack these interesting visuals to support student learning. In this paper, we investigate the effectiveness of vision-language models to automatically enhance textbooks with images from the web. We collect a dataset of e-textbooks in the math, science, social science and business domains. We then set up a text-image matching task that involves retrieving and appropriately assigning web images to textbooks, which we frame as a matching optimization problem. Through a crowd-sourced evaluation, we verify that (1) while the original textbook images are rated higher, automatically assigned ones are not far behind, and (2) the precise formulation of the optimization problem matters. We release the dataset of textbooks with an associated image bank to inspire further research in this intersectional area of computer vision and NLP for education.
- North America > United States (0.68)
- Europe > Portugal (0.14)
- Europe > Denmark (0.14)
- Education (1.00)
- Government (0.93)
- Materials > Chemicals > Industrial Gases > Liquified Gas (0.68)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)
Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases
Shalaby, Walid, Zadrozny, Wlodek, Jin, Hongxia
Text representations using neural word embeddings have proven effective in many NLP applications. Recent researches adapt the traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, these methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel and simple technique for integrating the knowledge about concepts from two large scale knowledge bases of different structure (Wikipedia, and Probase) in order to learn concept representations. We adapt the efficient skip-gram model to seamlessly learn from the knowledge in Wikipedia text and Probase concept graph. We evaluate our concept embedding models on two tasks: 1) analogical reasoning, where we achieve a stateof-the-art performance of 91% on semantic analogies, 2) concept categorization, where we achieve a state-of-the-art performance on two benchmark datasets achieving categorization accuracy of 100% on one and 98% on the other. Additionally, we present a case study to evaluate our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to better generalize to out of vocabulary entity mentions compared to the tedious and error prone methods which depend on gazetteers and regular expressions. In this paper, we use the terms "concept" and "entity" interchangeably. Hongxia Jin Samsung Research America 665 Clyde Avenue, Mountain View, CA 94043, USA Email: hongxia.jin@samsung.com 2 Walid Shalaby et al. Figure 1 Integrating knowledge from Wikipedia text (left) and Probase concept graph (right). Local concept-concept, concept-word, and word-word contexts are generated from both KBs and used for training the skip-gram model.
- North America > United States > California > Santa Clara County > Mountain View (0.24)
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (10 more...)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.90)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
Bootstrapping Distantly Supervised IE Using Joint Learning and Small Well-Structured Corpora
Bing, Lidong (Tencent Inc.) | Dhingra, Bhuwan (Carnegie Mellon University) | Mazaitis, Kathryn (Carnegie Mellon University) | Park, Jong Hyuk (Carnegie Mellon University) | Cohen, William W. (Carnegie Mellon University)
We propose a framework to improve the performance of distantly-supervised relation extraction, by jointly learning to solve two related tasks: concept-instance extraction and relation extraction. We further extend this framework to make a novel use of document structure: in some small, well-structured corpora, sections can be identified that correspond to relation arguments, and distantly-labeled examples from such sections tend to have good precision. Using these as seeds we extract additional relation examples by applying label propagation on a graph composed of noisy examples extracted from a large unstructured testing corpus. Combined with the soft constraint that concept examples should have the same type as the second argument of the relation, we get significant improvements over several state-of-the-art approaches to distantly-supervised relation extraction, and reasonable extraction performance even with very small set of distant labels.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Iceland > Capital Region > Reykjavik (0.04)
- (2 more...)
- Overview (0.34)
- Research Report > Promising Solution (0.34)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.41)